(54) ARITHMETIC UNIT AND ARITHMETIC METHOD



(57) An arithmetic and logic unit (ALU) 330, a shift processing unit (SHT) 340 and a register unit (REG) 350, each divided into, e.g., four sections, can mutually transfer data through 64-bit buses (BUS) 360, 370, 380. Data of plural fields within a word subject to operation inputted to the ALU 330 are exchanged as occasion demands by data exchange units (EXCs) 310, 320 provided between the buses 360, 370 and the ALU 330. Thus, operations between plural fields within the same word subject to operation can be realized in fewer steps than in the prior art.



[FIG. 10: 64-bit registers A and B, each divided into four 16-bit fields at bits 63-48, 47-32, 31-16 and 15-0, holding a0, a1, a2, a3 and b0, b1, b2, b3 respectively.]






EP 0 930 564 A1 

Description 

Technical Field 

[0001] This invention relates to an arithmetic apparatus and an arithmetic method for carrying out arithmetic and logical operations using a CPU.

Background Art 

[0002] Among CPUs (Central Processing Units), which are the arithmetic units (arithmetic and logic units) used in computers, etc., some have a group of instructions called multimedia instructions (hereinafter referred to as MM instructions or simply as instructions). An MM instruction divides the area of an arithmetic (computing) element of the CPU so as to execute plural operations at the same time.

[0003] An example of the configuration of a conventional CPU is shown in FIG. 1. This conventional CPU comprises an arithmetic and logic unit (ALU) 130 serving as arithmetic and logic means for executing data processing, a shift processing unit (SHT) 140 serving as shift processing means for shifting data in the left and right directions, and a register unit (REG) 150 such as an accumulator, etc., wherein those units are connected to, e.g., 64-bit buses 160, 170, 180 to mutually transfer data.

[0004] FIG. 2 shows multiplication by a 64 bit x 64 bit multiplier in the above-described conventional CPU. Namely, a 128-bit word s*t, which is the product of the 64-bit word s of register A and the 64-bit word t of register B, is generated and stored into register C.

[0005] FIG. 3 shows the state where the above-mentioned 64-bit words s and t are each divided into four bit fields to carry out multiplication of the respective fields, i.e., 16 bits x 16 bits. Namely, s0*t0, s1*t1, s2*t2 and s3*t3, each consisting of 32 bits, which are the products of the respective 16-bit fields s0, s1, s2, s3 of register A and the respective 16-bit fields t0, t1, t2, t3 of register B, are generated and stored into register C.

[0006] Such four-way parallel multiplication can be realized by quartering the multiplier of the CPU to constitute four parallel multipliers. Similarly, the adder of the CPU may also be quartered to constitute four parallel adders.

[0007] FIG. 4 shows addition by an adder of 128 bits + 128 bits in the above-described conventional CPU. Namely, the 128-bit sum s+t of the respective 32 bits s of register A and the respective 32 bits t of register B is generated and stored into register C.

[0008] FIG. 5 shows the state where the above-mentioned respective words are quartered to carry out additions of respective 32 bits + respective 32 bits. Namely, s0+t0, s1+t1, s2+t2 and s3+t3, each consisting of 32 bits, which are the sums of the respective 32-bit fields s0, s1, s2, s3 of register A and the respective 32-bit fields t0, t1, t2, t3 of register B, are generated and stored into register C.

[0009] When the data width subject to operation is about 16 bits or 32 bits as stated above, arithmetic processing can be carried out at high speed by using parallel arithmetic (computing) elements constituted by dividing a single arithmetic (computing) element. The instructions for carrying out the parallel operations shown in FIGS. 3 and 5 are a portion of the multimedia (MM) instructions used for that purpose.
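
The field-wise behaviour of a quartered multiplier and adder can be illustrated with a minimal sketch. It assumes 16-bit fields packed into one 64-bit word with the most significant field first, and it assumes that a field-wise sum wraps around inside its own field; the names `unpack`, `pmul` and `padd` are illustrative helpers, not part of any real instruction set.

```python
FIELD_BITS = 16                 # field width used in the examples of the text
MASK = (1 << FIELD_BITS) - 1


def unpack(word, fields=4):
    """Split a packed word into 16-bit fields, most significant field first."""
    return [(word >> (FIELD_BITS * (fields - 1 - i))) & MASK for i in range(fields)]


def pmul(s, t):
    """Field-wise 16 x 16 multiply; each product may need up to 32 bits."""
    return [x * y for x, y in zip(unpack(s), unpack(t))]


def padd(s, t):
    """Field-wise add; the wrap-around inside each field is an assumption."""
    return [(x + y) & MASK for x, y in zip(unpack(s), unpack(t))]


s = 0x0001_0002_0003_0004
t = 0x0005_0006_0007_0008
print(pmul(s, t))  # [5, 12, 21, 32]
print(padd(s, t))  # [6, 8, 10, 12]
```

No carry crosses a field boundary in this model, which is the property that lets one wide arithmetic element act as four narrow ones.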

[0010] A more practical example of conventional parallel operation using MM instructions is indicated below.
[0011] Initially, explanation will be given in connection with the case where n simultaneous linear equations as indicated by the following equation (1) are solved by using Cramer's formula.

$$
\begin{aligned}
a_{00}x_0 + a_{01}x_1 + \cdots + a_{0n}x_n &= b_0 \\
a_{10}x_0 + a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
&\;\;\vdots \\
a_{n0}x_0 + a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\tag{1}
$$

[0012] When Cramer's formula is used, the j-th columns of the nxn determinant are replaced in order as indicated by the above-mentioned equation (2), thereby making it possible to obtain the solutions of the simultaneous linear equations of equation (1). Namely, if the determinant can be calculated, the simultaneous linear equations can be solved.
[0013] In general, the nxn determinant is expanded as indicated by equation (3) by using small determinants of degree lower than n. In this case, $A_{ij}$ is the expression in which the sign given by $(-1)^{i+j}$ is attached to the determinant obtained by removing the i-th row and the j-th column from the nxn determinant.
[0014] Namely, if small determinants of lower degree are calculated in order, the original determinant can be calculated. Accordingly, if the 2x2 determinant, which is the determinant of the lowest degree, can be calculated, a determinant of arbitrary degree can be similarly calculated. In order to calculate the 2x2 determinant, it is sufficient to use the expansion indicated by equation (4).
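
The recursive reduction to 2x2 determinants can be sketched as follows. This is a generic cofactor expansion along the first row, with the standard 2x2 formula as the base case; the function name `det` and the nested-list matrix representation are assumptions made for illustration and do not come from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (row index i = 0)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    if n == 2:
        # Lowest-degree case: the standard 2x2 determinant.
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    total = 0
    for j in range(n):
        # Small determinant: remove row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        # Sign (-1)^(i+j) with i = 0.
        total += (-1) ** j * m[0][j] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                    # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3
```

Every path through the recursion ends in the 2x2 case, which is why speeding up that one calculation is the focus of the examples.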
[0030] FIG. 8 shows the state where two three-dimensional vectors (a00, a01, a02), (a10, a11, a12) are stored in registers A0, A1 as two words each consisting of 64 bits. Explanation will be given below of the procedure for calculating the outer product of two three-dimensional vectors stored in this way by using the conventional MM instructions.

[0031] FIG. 9 shows the procedure for calculating the outer product of the two three-dimensional vectors of FIG. 8 by using the conventional MM instructions.

[0032] Initially, the row vector stored in register A0 is shifted by 16 bits to the right by instruction "SRL B, A0, 16", and is stored into register B.

[0033] Then, the row vector stored in register A0 is shifted by 32 bits to the left by instruction "SLL C, A0, 32", and is stored into register C.

[0034] Then, the logical sum (OR) of the data stored in register B and the data stored in register C is generated by instruction "OR D, B, C", and is stored into register D. Thus, a01, a02, a00, a01, each consisting of 16 bits, are stored into register D.

[0035] Then, the row vector stored in register A1 is shifted by 16 bits to the left by instruction "SLL E, A1, 16", and is stored into register E.

[0036] Then, the row vector stored in register A1 is shifted by 32 bits to the right by instruction "SRL F, A1, 32", and is stored into register F.

[0037] Then, the logical sum (OR) of the data stored in register E and the data stored in register F is generated by instruction "OR G, E, F", and is stored into register G. Thus, a10, a11, a12, a10, each consisting of 16 bits, are stored into register G.

[0038] Then, the data stored in register D and the data stored in register G are multiplied in parallel by instruction "PMUL H, D, G", and the result is stored into register H. Namely, a01*a10, a02*a11, a00*a12, a01*a10, each consisting of 32 bits, are stored into register H.

[0039] Then, the row vector stored in register A0 is shifted by 16 bits to the left by instruction "SLL B, A0, 16", and is stored into register B.

[0040] Then, the row vector stored in register A0 is shifted by 32 bits to the right by instruction "SRL C, A0, 32", and is stored into register C.

[0041] Then, the logical sum (OR) of the data stored in register B and the data stored in register C is generated by instruction "OR D, B, C", and is stored into register D. Thus, a00, a01, a02, a00, each consisting of 16 bits, are stored into register D.

[0042] Then, the row vector stored in register A1 is shifted by 16 bits to the right by instruction "SRL E, A1, 16", and is stored into register E.

[0043] Then, the row vector stored in register A1 is shifted by 32 bits to the left by instruction "SLL F, A1, 32", and is stored into register F.

[0044] Then, the logical sum (OR) of the data stored in register E and the data stored in register F is generated by instruction "OR G, E, F", and is stored into register G. Thus, a11, a12, a10, a11, each consisting of 16 bits, are stored into register G.

[0045] Then, the data stored in register D and the data stored in register G are multiplied in parallel by instruction "PMUL J, D, G", and the result is stored into register J. Namely, a00*a11, a01*a12, a02*a10, a00*a11, each consisting of 32 bits, are stored into register J.

[0046] Then, the data stored in register H is subtracted in parallel from the data stored in register J by instruction "PSUB K, J, H", and the result is stored into register K. Namely, a00*a11 - a01*a10, a01*a12 - a02*a11, a02*a10 - a00*a12, a00*a11 - a01*a10, each consisting of 32 bits, are stored into register K.

[0047] As stated above, in the case where the conventional MM instructions are used to calculate the outer product of two three-dimensional vectors, the above-mentioned 15 steps are required.
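
The 15-step sequence can be traced with a small field-level model. This is only a sketch under stated assumptions: a register is modelled as a list of four field values with the most significant field first, shifts by multiples of 16 bits move whole fields, field widths and overflow are ignored, and each vector is assumed to occupy the three low-order fields with a zero top field, a layout chosen because it reproduces the register contents stated in paragraphs [0034] to [0046]. All function names are illustrative.

```python
def sll(r, bits):
    """Shift left by whole 16-bit fields, filling with zeros."""
    k = bits // 16
    return r[k:] + [0] * k


def srl(r, bits):
    """Shift right by whole 16-bit fields, filling with zeros."""
    k = bits // 16
    return [0] * k + r[:len(r) - k]


def por(x, y):
    return [a | b for a, b in zip(x, y)]


def pmul(x, y):
    return [a * b for a, b in zip(x, y)]


def psub(x, y):
    return [a - b for a, b in zip(x, y)]


def cross_conventional(v0, v1):
    """Follow the 15 conventional steps; returns register K."""
    A0 = [0] + list(v0)   # assumed layout: top field unused
    A1 = [0] + list(v1)
    B = srl(A0, 16)
    C = sll(A0, 32)
    D = por(B, C)         # a01, a02, a00, a01
    E = sll(A1, 16)
    F = srl(A1, 32)
    G = por(E, F)         # a10, a11, a12, a10
    H = pmul(D, G)
    B = sll(A0, 16)
    C = srl(A0, 32)
    D = por(B, C)         # a00, a01, a02, a00
    E = srl(A1, 16)
    F = sll(A1, 32)
    G = por(E, F)         # a11, a12, a10, a11
    J = pmul(D, G)
    K = psub(J, H)
    return K


print(cross_conventional((1, 2, 3), (4, 5, 6)))  # [-3, -3, 6, -3]
```

In this model the three low-order fields of K hold the outer product components, and the top field repeats the last component. Twelve of the fifteen steps only rearrange fields, which is the overhead the text goes on to criticize.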

[0048] Explanation will now be given of the case of calculating the inner product of two vectors, as a third more practical example in which the conventional MM instructions are used to carry out parallel operation.

[0049] The inner product of two vectors represents the degree of correlation between them. As an example, the inner product of two four-dimensional vectors is given by equation (7).

$$(a_0\ a_1\ a_2\ a_3) \cdot (b_0\ b_1\ b_2\ b_3) = a_0 * b_0 + a_1 * b_1 + a_2 * b_2 + a_3 * b_3 \tag{7}$$

[0050] FIG. 10 shows the state where two four-dimensional vectors (a0, a1, a2, a3), (b0, b1, b2, b3) are respectively stored into registers A, B as two 64-bit words. Explanation will be given below of the procedure for calculating the inner product of two four-dimensional vectors stored in this way by using the conventional MM instructions.

[0051] FIG. 11 shows the procedure for calculating the inner product of the two four-dimensional vectors of FIG. 10 by using the conventional MM instructions. In this example, the portions marked x indicate that values irrelevant to this operation are stored.

[0052] Initially, the data stored in register A and the data stored in register B are multiplied in parallel by instruction "PMUL C, A, B", and the result is stored into register C. Namely, a0*b0, a1*b1, a2*b2, a3*b3, each consisting of 16 bits, are stored into register C.

[0053] Then, the data stored in register C is shifted by 16 bits to the left by instruction "SLL D, C, 16", and is stored into register D.

[0054] Then, the data stored in register C and the data stored in register D are added in parallel by instruction "PADD E, C, D", and the result is stored into register E. Thus, within register E, the 16-bit value a2*b2 + a3*b3 is stored into bit 16 to bit 31, and the 16-bit value a0*b0 + a1*b1 is stored into bit 48 to bit 63.
[0055] Then, the data stored in register E is shifted by 32 bits to the left by instruction "SLL F, E, 32", and is stored into register F. Thus, within register F, a2*b2 + a3*b3 is stored into the Most Significant 16 bits, and the two low-order 16-bit data values both become equal to zero.

[0056] Then, the data stored in register E and the data stored in register F are added in parallel by instruction "PADD G, E, F", and the result is stored into register G. Thus, within register G, a0*b0 + a1*b1 + a2*b2 + a3*b3 is stored into the Most Significant 16 bits, and a2*b2 + a3*b3 is stored into bit 16 to bit 31.

[0057] As stated above, in the case where the conventional MM instructions are used to calculate the inner product of two four-dimensional vectors, the above-mentioned 5 steps are required.
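
The five conventional steps can be traced in the same spirit. The sketch below again assumes a register modelled as a list of four field values, most significant first, and it ignores field width and overflow; the helper names are illustrative only.

```python
def sll(r, bits):
    """Shift left by whole 16-bit fields, filling with zeros."""
    k = bits // 16
    return r[k:] + [0] * k


def pmul(x, y):
    return [a * b for a, b in zip(x, y)]


def padd(x, y):
    return [a + b for a, b in zip(x, y)]


def dot_conventional(a, b):
    """Follow the 5 conventional steps; returns register G."""
    C = pmul(list(a), list(b))  # a0*b0, a1*b1, a2*b2, a3*b3
    D = sll(C, 16)
    E = padd(C, D)              # top field: a0*b0 + a1*b1; bits 16-31: a2*b2 + a3*b3
    F = sll(E, 32)
    G = padd(E, F)              # top field: full inner product
    return G


G = dot_conventional((1, 2, 3, 4), (5, 6, 7, 8))
print(G[0])  # 70
print(G[2])  # 53
```

Only the top field and the field at bits 16 to 31 carry meaningful values; the other two correspond to the positions the text marks with x.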

[0058] Meanwhile, in the arithmetic apparatus and the arithmetic method using the conventional MM instructions, while data of plural fields of n bits are stored in a register, operation is performed only between the same (corresponding) bit fields. Namely, since arithmetic operation cannot be directly implemented between fields within a word subject to operation consisting of plural fields, extra field manipulation becomes necessary in order to perform operation between desired fields in carrying out parallel operation as described above, so that the operation speed cannot be made sufficiently high.

Disclosure of the Invention

[0059] This invention has been made in view of the above-described problems, and its object is to provide an arithmetic apparatus and an arithmetic method which can perform parallel operation at high speed in fewer steps than the conventional arithmetic apparatus.

[0060] The arithmetic apparatus according to this invention comprises arithmetic and logic means for performing arithmetic and logical operation with respect to a word subject to operation constituted by plural fields each consisting of M bits (M > 1), shift processing means for implementing shift operation by a predetermined number of bits with respect to the word subject to operation, and a register for storing the word subject to operation and the word on which the operation has been carried out, and has a function to perform parallel operation between the plural fields within the same word subject to operation.

[0061] In addition, an arithmetic method according to this invention is directed to an arithmetic method for performing arithmetic and logical operation in field units with respect to a word subject to operation constituted by plural fields each consisting of M bits, wherein the method includes a step of exchanging two or more fields within the same word subject to operation.

[0062] In accordance with the arithmetic apparatus and the arithmetic method as stated above, since there is no necessity of carrying out extra field manipulation, it is possible to perform parallel operation at high speed in fewer steps than in the prior art.

Brief Description of the Drawings 

[0063]

FIG. 1 is a view showing an example of the configuration of a conventional CPU.

FIG. 2 is a view for explaining multiplication by a multiplier of 64 bits x 64 bits.

FIG. 3 is a view for explaining parallel multiplication by a quartered multiplier of 64 bits x 64 bits.

FIG. 4 is a view for explaining addition by an adder of 64 bits x 64 bits.

FIG. 5 is a view for explaining parallel addition by a quartered adder of 64 bits x 64 bits.

FIG. 6 is a view showing the state where row vectors of a 3x3 matrix are stored in registers respectively as 64-bit words.

FIG. 7 is a view showing the procedure for calculating a small determinant of 2x2 by using conventional MM instructions with respect to row vectors of a 3x3 matrix.

FIG. 8 is a view showing the state where two three-dimensional vectors are stored in registers respectively as words of 64 bits.

FIG. 9 is a view showing the procedure for calculating the outer product by using conventional MM instructions with respect to two three-dimensional vectors.

FIG. 10 is a view showing the state where two four-dimensional vectors are stored in registers respectively as two words.

FIG. 11 is a view showing the procedure for calculating the inner product by using conventional MM instructions with respect to two four-dimensional vectors.

FIG. 12 is a view showing an example of the configuration of a CPU which is one form of an arithmetic apparatus of this invention.

FIG. 13 is a view showing an example of the fundamental configuration of a CPU having MM instructions.

FIGS. 14A, B, C are views for explaining instructions "PMUL" and "PADD".

FIGS. 15A to E are views for explaining MM instructions of the arithmetic apparatus of this invention.

FIG. 16 is a view showing an example of the configuration of the data exchange unit (EXC circuit).

FIG. 17 is a view for explaining a multiplexer (MUX) of the EXC circuit.

FIG. 18 is a view showing two commands sent to the MUX and the operation thereof.

FIG. 19 is a view showing the correspondence relationship between the EXC commands sent to the EXC circuit and the MM instructions to be realized.

FIG. 20 is a view showing a circuit for realizing instruction "PEXC".

FIG. 21 is a view showing a circuit for realizing instruction "PEXH".

FIG. 22 is a view showing a circuit for realizing instruction "PROT3".

FIG. 23 is a view showing a circuit for realizing instruction "PHADD".

FIG. 24 is a view showing a circuit for realizing instruction "PHSUB".

FIG. 25 is a view showing the procedure for calculating, by the arithmetic apparatus of this invention, a small determinant of 2x2 with respect to row vectors of a 3x3 matrix respectively stored.

FIG. 26 is a view showing the procedure for calculating the outer product with respect to two three-dimensional vectors by the arithmetic apparatus of this invention.

FIG. 27 is a view showing the procedure for calculating the inner product with respect to two four-dimensional vectors by the arithmetic apparatus of this invention.

FIG. 28 is a block diagram showing an example of the configuration of a picture preparation apparatus to which the arithmetic unit (apparatus) according to this invention is applied.

Best Mode for Carrying Out the Invention 

[0064] Preferred embodiments of an arithmetic apparatus and an arithmetic method of this invention will be described below with reference to the attached drawings. In the following description, the configuration of the embodiment of the arithmetic apparatus of this invention will be explained first, and the embodiment of the arithmetic method of this invention will then be explained with reference to that configuration.

[0065] FIG. 12 shows an example of the configuration of the main part of a CPU as one embodiment of the arithmetic apparatus of this invention. This CPU comprises an arithmetic and logic unit (ALU) 330 serving as arithmetic and logic means, a shift processing unit (SHT) 340, and a register unit (REG) 350, wherein these units can mutually transfer data through 64-bit buses (BUS) 360, 370, 380 and 16-bit parallel buses. The ALU 330, the SHT 340 and the REG 350 are each divided into four sections.

[0066] While the above-mentioned respective components have configurations similar to the respective portions of the CPU shown in FIG. 13, the CPU of FIG. 12 differs in that it comprises data exchange units (EXC) 310, 320 serving as means for exchanging bit fields within a word. Namely, the EXCs 310, 320 realize the arithmetic (computational) function of performing operation between plural fields within the same word subject to operation at the ALU 330. In this example, one field consists of M bits (M ≥ 1). In the embodiment described below, one field is, e.g., 16 bits.

[0067] Prior to the explanation of the new MM instructions of the above-described arithmetic apparatus of this invention, the previously described MM instructions "PMUL" and "PADD" will be described again with reference to an example of the configuration of the Central Processing Unit (CPU), which is the essential part of the arithmetic unit (apparatus) of this invention.

[0068] FIG. 13 shows an example of the fundamental configuration of a CPU having MM instructions. This configuration is based on the configuration of the conventional CPU shown in FIG. 1, which does not have the new MM instructions, but differs from it in that an ALU 230, a SHT 240 and a REG 250 are each divided into four sections.

[0069] Further, as the data transfer path between a bus 260 and the ALU 230, four 16-bit parallel transfer paths 265 are provided in place of the 64-bit parallel transfer paths.

[0070] FIGS. 14A to C show the MM instructions "PMUL" and "PADD" executed at the arithmetic unit of FIG. 13.
[0071] FIG. 14A shows the state where 16-bit data are respectively stored into the quartered 16-bit fields of the 64-bit registers A, B of the REG 250.
[0072] FIG. 14B shows the state where the four data individually stored in the four fields of register A and the four data stored in register B are multiplied in parallel at the ALU 230 by instruction "PMUL C, A, B", and the products, each consisting of 32 bits, are stored into register C of the REG 250.

[0073] Moreover, FIG. 14C shows the state where the four data stored in register A and the four data stored in register B are added in parallel by instruction "PADD C, A, B", and the sums, each consisting of 16 bits, are stored into register C.

[0074] However, operations by the MM instructions described above in the arithmetic unit of FIG. 13 are carried out in word units, and additional steps are required in order to perform operation in field units. In view of the above, the arithmetic unit of this invention further includes a "bit field exchange instruction within the word" and an "operation instruction between data within the word", which are new MM instructions adapted for performing operation in fewer steps.

[0075] The group of MM instructions of the arithmetic unit (apparatus) of this invention will now be described with reference to FIGS. 15A to E.

[0076] FIG. 15A shows instruction "PEXC". Namely, the instruction "PEXC B, A" leaves the data of the Most Significant field and of the Least Significant field of the quartered register A as they are, exchanges the data of the two central fields between them, and stores the result into register B.

[0077] FIG. 15B shows instruction "PEXH". Namely, the instruction "PEXH B, A" exchanges the data of the two high-order fields of the quartered register A with each other, exchanges the data of the two low-order fields with each other, and stores the result into register B.
[0078] FIG. 15C shows instruction "PROT3". Namely, the instruction "PROT3 B, A, 16" leaves the data of the Most Significant field of the quartered register A as it is, shifts the data of the three low-order fields by 16 bits so that they undergo rotation, and stores the result into register B.

[0079] FIG. 15D shows instruction "PHADD". Namely, the instruction "PHADD B, A" adds the data of the two high-order fields of the quartered register A to each other, adds the data of the two low-order fields to each other, and stores the result into register B.

[0080] FIG. 15E shows instruction "PHSUB". Namely, the instruction "PHSUB B, A" subtracts the data of the two high-order fields of the quartered register A from each other, subtracts the data of the two low-order fields from each other, and stores the result into register B.
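
The five instructions can be modelled at field level as a hedged sketch. A register is taken to be a list of four field values, most significant first. Two points are assumptions rather than statements of the text: the rotation of "PROT3" is taken to be to the right, and "PHADD"/"PHSUB" are modelled as an operation between the word and its "PEXH"-exchanged copy, so that the fields b1 and b3, which the text leaves unspecified, receive the mirrored sum or difference.

```python
def pexc(a):
    """Exchange the two central fields; outer fields stay in place."""
    return [a[0], a[2], a[1], a[3]]


def pexh(a):
    """Exchange within the high-order pair and within the low-order pair."""
    return [a[1], a[0], a[3], a[2]]


def prot3(a, bits):
    """Keep the top field; rotate the three low-order fields (right rotation assumed)."""
    low = a[1:]
    k = (bits // 16) % 3
    return [a[0]] + low[len(low) - k:] + low[:len(low) - k]


def phadd(a):
    """b0 = a0 + a1 and b2 = a2 + a3; contents of b1 and b3 are an assumption."""
    return [x + y for x, y in zip(a, pexh(a))]


def phsub(a):
    """b0 = a0 - a1 and b2 = a2 - a3; contents of b1 and b3 are an assumption."""
    return [x - y for x, y in zip(a, pexh(a))]


a = [10, 11, 12, 13]
print(pexc(a))       # [10, 12, 11, 13]
print(pexh(a))       # [11, 10, 13, 12]
print(prot3(a, 16))  # [10, 13, 11, 12]
print(prot3(a, 32))  # [10, 12, 13, 11]
print(phadd(a))      # [21, 21, 25, 25]
print(phsub(a))      # [-1, 1, -1, 1]
```

With a right rotation, `prot3(a, 32)` maps a1 to b3, a2 to b1 and a3 to b2, which is the example mapping the text gives for the circuit of FIG. 22.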

[0081] As stated above, the arithmetic unit (apparatus) of this invention has, in addition to the conventional MM instructions, an instruction for carrying out exchange between divided bit fields and an instruction for performing operation between different bit fields within the same register, thereby improving operation performance.
[0082] The configuration of the arithmetic unit (apparatus) of this invention having the new MM instructions described above in addition to the conventional MM instructions will now be described more concretely.

[0083] FIG. 16 shows an example of the configuration of the data exchange unit (EXC circuit) 310 of FIG. 12. The respective inputs A0 to A3 to this EXC circuit 310 are delivered to the respective multiplexers (MUXs) 311 to 314. Each MUX selects the data to be outputted according to the two commands delivered to it. Thus, the operation of the EXC 310 is controlled by commands C0 to C7.

[0084] It is to be noted that while only the EXC 310 has been described here, the operation similarly applies to the EXC circuit 320.

[0085] The MUXs 311 to 314 of the above-described EXC circuits 310, 320 will now be described. Each of these MUXs has four inputs and one output, and its operation is controlled by two commands.
[0086] FIG. 17 shows the MUX 311 among the above-mentioned MUXs 311 to 314. This MUX 311 has four inputs and one output, and its operation is controlled by the two commands C0, C1.
[0087] FIG. 18 shows the correspondence relationship between the two commands sent to the MUX 311 and its operation. Namely, when commands C0, C1 are both 0, input A0 is caused to be output B0. When C0 is 0 and C1 is 1, input A1 is caused to be output B0. Similarly, when C0 is 1 and C1 is 0, input A2 is caused to be output B0. In addition, when C0, C1 are both 1, input A3 is caused to be output B0.

[0088] It is to be noted that while the MUX 311 has been described here, the operation similarly applies to the MUXs 312 to 314. Namely, the operation of the MUX 312 is controlled by commands C2, C3, the operation of the MUX 313 is controlled by commands C4, C5, and the operation of the MUX 314 is controlled by commands C6, C7.
[0089] FIG. 19 shows the correspondence relationship between the EXC commands C0 to C7 sent to the EXC circuit shown in FIG. 16 and the MM instructions realized by these commands. Namely,

When C0, C1, C3 and C4 are 0, and C2, C5, C6 and C7 are 1, instruction "PEXC" is realized.

When C0, C2, C3 and C7 are 0, and C1, C4, C5 and C6 are 1, instruction "PEXH" is realized.

When C0, C1, C4 and C7 are 0, and C2, C3, C5 and C6 are 1, instruction "PROT3" is realized.

When C0, C2, C3 and C7 are 0, and C1, C4, C5 and C6 are 1, instruction "PHADD" is realized.

When C0, C2, C3 and C7 are 0, and C1, C4, C5 and C6 are 1, instruction "PHSUB" is realized.

[0090] It is to be noted that the above-mentioned instructions "PHADD" and "PHSUB" are the same with respect to the EXC commands, but differ in the command given to the ALU.
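
The command table can be checked against the multiplexer rule with a short simulation. The sketch assumes that the four MUXs 311 to 314 drive outputs B0 to B3 in that order and that each sees all four inputs A0 to A3; the function names `mux`, `exc` and `commands` are illustrative.

```python
def mux(inputs, c_first, c_second):
    """4-to-1 multiplexer: (0,0)->A0, (0,1)->A1, (1,0)->A2, (1,1)->A3."""
    return inputs[2 * c_first + c_second]


def exc(inputs, cmd):
    """Four MUXs driven by the pairs (C0,C1), (C2,C3), (C4,C5), (C6,C7)."""
    return [mux(inputs, cmd[2 * i], cmd[2 * i + 1]) for i in range(4)]


def commands(zero_bits):
    """Build C0..C7 from the set of indices that are 0; all others are 1."""
    return [0 if i in zero_bits else 1 for i in range(8)]


PEXC = commands({0, 1, 3, 4})
PEXH = commands({0, 2, 3, 7})   # also the EXC setting for PHADD and PHSUB
PROT3 = commands({0, 1, 4, 7})

a = ["A0", "A1", "A2", "A3"]
print(exc(a, PEXC))   # ['A0', 'A2', 'A1', 'A3']
print(exc(a, PEXH))   # ['A1', 'A0', 'A3', 'A2']
print(exc(a, PROT3))  # ['A0', 'A3', 'A1', 'A2']
```

Under these assumptions the "PROT3" setting moves each of the three low-order fields one position toward the least significant end, with A3 wrapping around to B1.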

[0091] The circuits for realizing the new MM instructions of the above-described arithmetic unit (apparatus) of this invention will now be explained more concretely. In the following description, a0 to a3 are input data each having a data width of 16 bits or 32 bits, and constitute one word as a whole. In addition, b0 to b3 are output data each having a data width of 16 bits or 32 bits, and constitute one word as a whole.

[0092] FIG. 20 shows a circuit for realizing instruction "PEXC". This circuit comprises an exchange circuit "exchange". Of the four data a0, a1, a2, a3 inputted to this circuit, the Most Significant data a0 and the Least Significant data a3 are outputted as b0 and b3, respectively, as they are. In addition, the two data between the Most Significant data and the Least Significant data are exchanged with each other, and are outputted in the state where a1 is caused to be b2 and a2 is caused to be b1.

[0093] FIG. 21 shows a circuit for realizing instruction "PEXH". This circuit comprises two exchange circuits "exchange". The two high-order data a0, a1 of the four data a0, a1, a2, a3 inputted to this circuit are exchanged with each other, and are outputted in the state where a0 is caused to be b1 and a1 is caused to be b0. In addition, the two low-order data a2, a3 of the inputted four data are exchanged with each other, and are outputted in the state where a2 is caused to be b3 and a3 is caused to be b2.

[0094] FIG. 22 shows a circuit for realizing instruction "PROT3". In this example, "select" is a selector circuit. The Most Significant data a0 of the four data a0, a1, a2, a3 inputted to this circuit is outputted as b0 as it is. In addition, the other three data a1, a2, a3 are outputted in the state where, e.g., a1 is caused to be b3, a2 is caused to be b1 and a3 is caused to be b2 by the three-input, one-output selector circuits "select". Namely, the three data other than the Most Significant data a0 are outputted after undergoing rotation.
[0095] FIG. 23 shows a circuit for realizing instruction "PHADD". This circuit comprises two adding circuits "ADD". The two high-order data a0, a1 of the four data a0, a1, a2, a3 inputted to this circuit are added to each other, and the sum is outputted as b0. In addition, the two low-order data a2, a3 of the inputted four data are added to each other, and the sum is outputted as b2.

[0096] FIG. 24 shows a circuit for realizing instruction "PHSUB". This circuit comprises two subtracting circuits "SUB". Of the two high-order data a0 and a1 of the four data a0, a1, a2, a3 inputted to this circuit, data a1 is subtracted from data a0, and the difference is outputted as b0. In addition, of the two low-order data a2 and a3 of the inputted four data, data a3 is subtracted from data a2, and the difference is outputted as b2.

[0097] Explanation will now be given of the case where operation is performed by the arithmetic unit (apparatus) of this invention, which has the function of carrying out exchange and/or operation (computation) between different bit fields within the same word as previously described.
[0098] FIG. 25 shows the procedure for calculating a small determinant of 2x2 with respect to row vectors of a 3x3 matrix by using the arithmetic unit (apparatus) of this invention.

[0099] Initially, by the previously described instruction "PEXH D, A1", the two high-order data of the quartered register A1 are exchanged with each other, the two low-order data are exchanged with each other, and the result is stored into register D.

[0100] Then, parallel multiplication of the row vector stored in register A0 and the data stored in register D is carried out in 16-bit units by instruction "PMULH E, A0, D", and the result is stored into register E. The instruction "PMULH" carries out an operation similar to the previously described instruction "PMUL" with only half of the word length as the unit. Thus, a01*a12 is stored into the high-order 32 bits of register E and a02*a11 is stored into the low-order 32 bits.

[0101] Then, parallel subtraction to subtract, from the high-order data stored in register E, the low-order
data stored in register E is carried out in 32-bit units by instruction "PSUBW G, E", and the result is stored
into register G. This instruction "PSUBW" carries out an operation similar to instruction "PSUB" with the word
length as the unit. Thus, 0 is stored into the high-order 32 bits of register G and a01*a12 - a02*a11 is stored
into the low-order 32 bits.

[0102] As stated above, calculating the 2x2 determinant required 9 steps with the conventional arithmetic unit
(apparatus), as shown in FIG. 7. In contrast, with the arithmetic unit (apparatus) of this invention, the
calculation can be performed in only the three steps described above.
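
The three-step procedure can be traced with a small software model. This is a sketch under stated assumptions rather than the hardware of FIG. 25: the function names, the tuple representation of a register (high-order field first), the register layouts A0 = (0, a00, a01, a02) and A1 = (0, a10, a11, a12), and the reading that "PMULH" multiplies the two low-order 16-bit fields to give two 32-bit products are all assumptions chosen to be consistent with the results stated in the text.

```python
def pexh(word):
    """PEXH: exchange the two high-order fields and the two low-order fields."""
    f0, f1, f2, f3 = word
    return (f1, f0, f3, f2)


def pmulh(x, y):
    """PMULH (assumed): multiply the two low-order fields of x and y,
    giving two wider products (high-order product first)."""
    return (x[2] * y[2], x[3] * y[3])


def psubw(word):
    """PSUBW: subtract the low-order half from the high-order half;
    0 goes to the high-order half, the difference to the low-order half."""
    hi, lo = word
    return (0, hi - lo)


A0 = (0, 1, 2, 3)   # assumed layout (0, a00, a01, a02)
A1 = (0, 4, 5, 6)   # assumed layout (0, a10, a11, a12)

D = pexh(A1)        # step 1: (a10, 0, a12, a11)
E = pmulh(A0, D)    # step 2: (a01*a12, a02*a11)
G = psubw(E)        # step 3: (0, a01*a12 - a02*a11)
print(D, E, G)
```

With these sample values the low-order half of G holds 2*6 - 3*5.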

[0103] FIG. 26 shows the procedure for calculating outer product of two three-dimensional vectors by the arithmetic 
unit (apparatus) of this invention. 

[0104] Initially, by instruction "PROT3 B, A0, 16", the most significant data of register A0 is left as it is
and the three low-order data are rotated by 16 bits, and the result is stored into register B.

[0105] Then, by instruction "PROT3 C, A1, 32", the most significant data of register A1 is left as it is and
the three low-order data are rotated by 32 bits, and the result is stored into register C.

[0106] Then, by instruction "PMUL D, B, C", parallel multiplication of the row vector stored in register B and
the data stored in register C is carried out. The result thus obtained is stored into register D. Namely, 0 is
stored into the most significant 32 bits of register D, and a02*a11, a00*a12 and a01*a10 are stored in order
into the respective subsequent (succeeding) 32 bits.

[0107] Then, by instruction "PROT3 B, A0, 32", the most significant data of register A0 is left as it is and
the three low-order data are rotated by 32 bits, and the result is stored into register B.

[0108] Then, by instruction "PROT3 C, A1, 16", the most significant data of register A1 is left as it is and
the three low-order data are rotated by 16 bits, and the result is stored into register C.

[0109] Then, by instruction "PMUL E, B, C", parallel multiplication of the data stored in register B and the
data stored in register C is carried out. The result thus obtained is stored into register E. Namely, 0 is
stored into the most significant 32 bits of register E, and a01*a12, a02*a10 and a00*a11 are stored in order
into the respective subsequent (succeeding) 32 bits.

[0110] Then, by instruction "PSUB F, E, D", parallel subtraction to subtract, from the data stored in register
E, the data stored in register D is carried out. The result thus obtained is stored into register F. Namely, 0
is stored into the most significant 32 bits of register F, and a01*a12 - a02*a11, a02*a10 - a00*a12 and
a00*a11 - a01*a10 are stored into the respective subsequent (succeeding) 32 bits.

[0111] As stated above, calculating the outer product of two three-dimensional vectors required 15 steps with
the conventional arithmetic apparatus, as shown in FIG. 9. In contrast, with the arithmetic unit (apparatus) of
this invention, the calculation can be performed in only the seven steps described above.
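
The seven steps can be checked with a small software model. This is a sketch only, not the circuit of FIG. 26: the function names, the tuple representation of a register (most significant field first), the layouts A0 = (0, a00, a01, a02) and A1 = (0, a10, a11, a12), and the rotation direction of "PROT3" are assumptions, the last one inferred from the products that the text says end up in registers D and E. Field widths are not modelled.

```python
def prot3(word, bits, field_bits=16):
    """PROT3 (assumed direction): keep the most significant field and rotate
    the three low-order fields toward the low-order side by bits/field_bits fields."""
    k = (bits // field_bits) % 3
    low = list(word[1:])
    if k:
        low = low[-k:] + low[:-k]
    return (word[0], *low)


def pmul(x, y):
    """PMUL: field-by-field parallel multiplication."""
    return tuple(p * q for p, q in zip(x, y))


def psub(x, y):
    """PSUB: field-by-field parallel subtraction x - y."""
    return tuple(p - q for p, q in zip(x, y))


A0 = (0, 1, 2, 3)    # assumed layout (0, a00, a01, a02)
A1 = (0, 4, 5, 6)    # assumed layout (0, a10, a11, a12)

B = prot3(A0, 16)    # step 1: (0, a02, a00, a01)
C = prot3(A1, 32)    # step 2: (0, a11, a12, a10)
D = pmul(B, C)       # step 3: (0, a02*a11, a00*a12, a01*a10)
B = prot3(A0, 32)    # step 4: (0, a01, a02, a00)
C = prot3(A1, 16)    # step 5: (0, a12, a10, a11)
E = pmul(B, C)       # step 6: (0, a01*a12, a02*a10, a00*a11)
F = psub(E, D)       # step 7: the three components of the product
print(F)
```

For these sample values the three low-order fields of F agree with the usual vector product of (1, 2, 3) and (4, 5, 6).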

[0112] FIG. 27 shows the procedure for calculating inner product of two four-dimensional vectors by the arithmetic 
unit (apparatus) of this invention. 

[0113] Initially, by instruction "PMUL C, A, B", parallel multiplication of the data stored in register A and
the data stored in register B is carried out. The result thus obtained is stored into register C. Namely,
a0*b0, a1*b1, a2*b2 and a3*b3, each consisting of 32 bits, are stored into register C.

[0114] Then, by instruction "PHADD D, C", the two high-order data of register C are added to each other and
the two low-order data thereof are added to each other, and the results are stored into register D.

[0115] Then, by instruction "PEXC E, D", the most significant data and the least significant data of register
D are left as they are, and the two data at the central portion therebetween are exchanged, and the result is
stored into register E.

[0116] Then, by instruction "PHADD G, E", the two high-order data of register E are added to each other and
the two low-order data thereof are added to each other, and the results are stored into register G. Thus,
a0*b0 + a1*b1 + a2*b2 + a3*b3 is stored into the most significant 32 bits of register G.

[0117] In this example, the portions marked x in FIG. 27 indicate that values irrelevant to this operation are
stored.

[0118] As stated above, calculating the inner product of two four-dimensional vectors required 5 steps with
the conventional arithmetic unit (apparatus), as shown in FIG. 11. In contrast, with the arithmetic unit
(apparatus) of this invention, the calculation can be performed in only the four steps described above.
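
The four steps can likewise be traced in software. This is an illustrative sketch, not the circuit of FIG. 27: the function names and the tuple representation of a register (most significant field first) are assumptions. The text only says that the fields marked x hold irrelevant values, so this model simply carries the old field contents through in those positions as a placeholder.

```python
def pmul(x, y):
    """PMUL: field-by-field parallel multiplication."""
    return tuple(p * q for p, q in zip(x, y))


def phadd(word):
    """PHADD: add the two high-order fields and the two low-order fields.
    The sums land in fields 0 and 2; fields 1 and 3 are 'x' (irrelevant)
    and are passed through unchanged here as an assumption."""
    f0, f1, f2, f3 = word
    return (f0 + f1, f1, f2 + f3, f3)


def pexc(word):
    """PEXC: keep the most and least significant fields, swap the middle two."""
    f0, f1, f2, f3 = word
    return (f0, f2, f1, f3)


A = (1, 2, 3, 4)     # (a0, a1, a2, a3)
B = (5, 6, 7, 8)     # (b0, b1, b2, b3)

C = pmul(A, B)       # step 1: (a0*b0, a1*b1, a2*b2, a3*b3)
D = phadd(C)         # step 2: (a0*b0 + a1*b1, x, a2*b2 + a3*b3, x)
E = pexc(D)          # step 3: both partial sums now in the high-order pair
G = phadd(E)         # step 4: the full inner product in the most significant field
print(G[0])
```

Only the most significant field of G is meaningful; the remaining fields correspond to the portions marked x.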

[0119] FIG. 28 shows an example of the configuration of a picture preparation apparatus constituted with the arith- 
metic unit (apparatus) according to this invention having the above explained MM instructions. 
[0120] In FIG. 28, a CPU 1, which is a central processing unit comprising a microprocessor, etc., serves to
take out operation information of an input device 4, such as an input pad or joystick, through an interface 3
and a main bus 9, and the arithmetic unit of this invention is used for this CPU 1. Further, on the basis of
the operation information thus taken out, the CPU 1 sends information of a three-dimensional picture stored in
a main memory 2, which is a first memory, to a graphic processor 6 through the main bus 9.

[0121] The graphic processor 6 converts the sent information of the three-dimensional picture to generate
picture data, and the three-dimensional picture based on the picture data generated here is depicted on a video
memory 5, which is a second memory. The three-dimensional picture data depicted on this video memory 5 is read
out at the time of scanning of the video signal. Thus, the three-dimensional picture is displayed on a display
unit (not shown).

[0122] Moreover, simultaneously with the display of the three-dimensional picture as described above, speech
(sound) information corresponding to the displayed three-dimensional picture, contained in the operation
information taken out by the CPU 1, is sent to an audio processor 7. On the basis of this sent speech
information, the audio processor 7 outputs speech data stored in an audio memory 8.

[0123] Such a picture preparation apparatus is used, e.g., in home game machines for which it is required to display 
three-dimensional picture with relatively high accuracy and at high speed. 

[0124] In home game machines, representative methods of displaying a three-dimensional picture by using a
picture preparation apparatus as described above are the shading method, which adds shade to the object to be
displayed, and texture mapping, which deforms another two-dimensional picture and pastes it.

[0125] Moreover, the coordinate systems used to represent three dimensions are in many instances an object
coordinate system for representing the shape or dimensions of the three-dimensional object itself, a world
coordinate system indicating the position of the object when the three-dimensional object is disposed in space,
and a screen coordinate system for representing the three-dimensional object displayed on the screen. In
particular, the polygonal area serving as the unit that represents the three-dimensional picture of the
three-dimensional object on the screen coordinate system, the so-called polygon, is in many instances dealt
with as a simplified triangular area.

[0126] The arithmetic unit (apparatus) according to this invention is suitable, with respect to such a
triangular area (polygon), for calculating vertex coordinates, or for carrying out inner product calculation,
etc. of the normal vector and the light source vector from the attributes of the object and the light source
data.
[0127] The arithmetic apparatus as explained above is configured to have, in addition to the conventional MM
instructions, MM instructions having a function to perform operation between plural fields within the same word
of the operational object (object to be computed). For this reason, it is possible to perform parallel
operation at high speed in fewer steps than in the prior art.

[0128] It is to be noted that this invention is not limited to the above-described embodiments, and, as a
matter of course, e.g., the number of bits of the register and/or the number of bits of the field are not
limited to the numbers shown.

Claims 

1. An arithmetic apparatus comprising:

arithmetic and logic means for performing arithmetic and logical operation with respect to word subject to oper-
ation, which is constituted with plural fields consisting of M bits (M ≥ 1);

shift processing means for implementing shift operation by a predetermined number of bits with respect to the 
word subject to operation; and 

a register for storing the word subject to operation and word in which the operation has been carried out, 
wherein the apparatus has a function to perform parallel operation between the plural fields within the same 
word subject to operation. 

2. An arithmetic apparatus as set forth in claim 1,

wherein the arithmetic and logic means includes plural arithmetic and logic units for performing arithmetic and 
logical operation in units of the field with respect to data subject to operation, the shift processing means 
includes a shift processing unit for implementing shift operation by a predetermined number of bits in the field 
units with respect to data subject to operation, and the register includes plural register units for storing, in the 
field units, data subject to operation and data in which operation has been carried out. 

3. An arithmetic apparatus as set forth in claim 2,

which further comprises field exchange means for carrying out exchange between (predetermined ones of) the 
fields within word subject to operation consisting of the plural fields. 

4. An arithmetic method for performing arithmetic and logical operation in field units with respect to word subject to
operation constituted with plural fields consisting of M bits (M ≥ 1),

the method including a step of exchanging two fields or more within the same word subject to operation. 

5. An arithmetic method as set forth in claim 4, 

wherein arithmetic and logical operation is carried out between the fields of word subject to operation in which
field exchange has been carried out to store result of the operation into one of the fields subject to operation.